Abstract. A new approach for optimal estimation of Markov chains with sparse transition matrices is presented.

1. Mathematical Framework

We begin with a formal mathematical definition of a Markov chain: 

Definition 1.1. Let $n$ and $d$ be elements of $\mathbb{N}$, such that $n > 1$ and $d > 1$. Define $\Omega = \{1, \ldots, d\}$. Consider a sequence of random variables $\{X_1, X_2, \ldots, X_n\}$ such that

$$P_{ij} = P(X_{k+1} = j \mid X_k = i) \tag{1.1}$$

is independent of $k$ for all $i$ and $j$ in $\Omega$. Then the sequence $\{X_1, \ldots, X_n\}$ is a Markov chain with state space $\Omega$ and transition probabilities $P_{ij}$ for $i$ and $j$ in $\Omega$.

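It follows from this definition that a Markov chain with known probability distribution of the initial state is completely characterized by a $d \times d$ matrix containing the transition probabilities $P_{ij}$,

$$P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1d} \\ P_{21} & P_{22} & \cdots & P_{2d} \\ \vdots & \vdots & & \vdots \\ P_{d1} & P_{d2} & \cdots & P_{dd} \end{bmatrix}.$$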
This matrix is called the transition probability matrix. Since the elements of row $i$ of this matrix represent the conditional probabilities for all possible state changes from state $i$, they must satisfy

$$\sum_{j=1}^{d} P_{ij} = 1, \tag{1.2}$$

for all $i \in \Omega$. For a Markov chain with known transition probability matrix, the most likely state as $n \to \infty$ can be calculated as follows. Define a vector $V_k$ so that the $i$th element of $V_k$ is the unconditional probability that the Markov chain is in state $i$ at time $k$. Hence, $(V_k)_i = P(X_k = i)$, where $V_k' = [(V_k)_1, \ldots, (V_k)_d]$.

The probability $(V_{k+1})_i = P(X_{k+1} = i)$ can be related to the vector $V_k$ using the Law of Total Probability,

$$(V_{k+1})_i = P(X_{k+1} = i) = \sum_{j=1}^{d} P(X_k = j)\, P(X_{k+1} = i \mid X_k = j) = \sum_{j=1}^{d} P_{ji}\, (V_k)_j.$$

Hence $V_{k+1} = P' V_k$. One can then use an inductive argument to establish that $V_{k+1} = (P')^{k} V_1$. Here $V_1$ is a vector of probabilities corresponding to the distribution of the initial state of the Markov chain. Hence

$$P(X_k = j) = (V_k)_j$$

for $j = 1, \ldots, d$.

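The limiting, or steady state, probabilities, if they exist, are then given by

$$\Pi^{(i)} = \lim_{n \to \infty} \left[(P')^{n}\right] V_1^{(i)}, \tag{1.3}$$

where $V_1^{(i)}$ is the initial distribution that puts probability 1 on state $i$, so that $(V_1^{(i)})_k = \delta_{ik}$. Since

$$[\Pi^{(i)}]_j = \sum_{k=1}^{d} \lim_{n \to \infty} [(P')^{n}]_{jk}\, \delta_{ik} = \lim_{n \to \infty} [(P')^{n}]_{ji},$$

it follows that $[\Pi^{(i)}]' = [\Pi^{(i)}_1, \ldots, \Pi^{(i)}_d]$ is the $i$th row of $P_\infty = \lim_{n \to \infty} P^{n}$.

Under certain conditions [14], the limit will exist and the rows of $P_\infty$ will be identical. We will denote one of these rows as $\Pi$. The elements of $\Pi$ correspond to the long-range probabilities that the Markov chain is in each of the states. In some instances $\Pi$ can be found analytically.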

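Example: Consider a Markov chain with transition probability matrix

$$P = \begin{bmatrix} \frac{1+a}{2} & \frac{1-a}{2} \\ \frac{1-a}{2} & \frac{1+a}{2} \end{bmatrix},$$

where $0 < a < 1$. A simple induction argument shows that

$$P^{n} = \begin{bmatrix} \frac{1+a^{n}}{2} & \frac{1-a^{n}}{2} \\ \frac{1-a^{n}}{2} & \frac{1+a^{n}}{2} \end{bmatrix}$$

for all integers $n > 1$. Since $0 < a < 1$, $\lim_{n \to \infty} a^{n} = 0$, so the limit $P_\infty = \lim_{n \to \infty} P^{n}$ exists and the rows of $P_\infty$ are identical:

$$P_\infty = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}.$$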
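The closed form in this example can be checked numerically. The sketch below assumes the two-state matrix has $(1+a)/2$ on the diagonal and $(1-a)/2$ off the diagonal; the value $a = 0.6$ and the helper names are illustrative choices, not taken from the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_power(p, n):
    """Return p**n for an integer n >= 1 by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


def closed_form(a, n):
    """Closed form of P^n for the two-state example."""
    high, low = (1 + a ** n) / 2, (1 - a ** n) / 2
    return [[high, low], [low, high]]


a = 0.6  # illustrative value; any 0 < a < 1 behaves the same way
P = [[(1 + a) / 2, (1 - a) / 2], [(1 - a) / 2, (1 + a) / 2]]
for n in (1, 2, 5, 50):
    # First row of the numerical power next to the first row of the closed form.
    print(n, mat_power(P, n)[0], closed_form(a, n)[0])
```

As $n$ grows, both rows printed for each $n$ approach $(1/2, 1/2)$, in line with the limit stated above.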
2. Estimation of the Transition Probability Matrix

In most practical cases, the transition probability matrix is unknown and it must then be estimated based on the observations. Let $X_1, X_2, \ldots, X_n$ be $n$ consecutive observations from a Markov chain. The maximum likelihood estimator of the matrix $P$, which we will denote as $\hat{P}$, is defined as follows [3]:

1) : For each state $i \in \Omega$, let $n_i$ be the number of times that state $i$ is observed in $X_1, X_2, \ldots, X_{n-1}$.

2) : If $n_i = 0$ (the state is not represented in the chain, except maybe for the last position), then we formally define all probabilities of transition from the state $i$ to any state $j \neq i$ to be 0, $\hat{P}_{ij} = 0$, for every $j \neq i$. Therefore, by (1.2), we have $\hat{P}_{ii} = 1$.

3) : If $n_i > 0$, let $n_{ij}$ be the number of observed consecutive transitions from state $i$ to state $j$ in $X_1, X_2, \ldots, X_n$. In this case, $\hat{P}_{ij} = n_{ij}/n_i$ for $j = 1, \ldots, d$.

Note that the final observed state of the chain is not counted in Step 1 because we do not observe any transitions from this state. Hence, we only observe $n - 1$ transitions. Note also that the estimate $\hat{P}$ is a valid transition probability matrix.
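The three steps above can be sketched in a few lines of Python. The function name and the 1-based state coding are illustrative assumptions; the chain used is the first sample of Table 1.

```python
def mle_transition_matrix(chain, d):
    """Maximum likelihood estimate of a d x d transition matrix.

    chain: observed states coded 1..d; returns a list of d rows.
    """
    counts = [[0] * d for _ in range(d)]
    for current, following in zip(chain, chain[1:]):
        counts[current - 1][following - 1] += 1
    estimate = []
    for i, row in enumerate(counts):
        visits = sum(row)  # n_i: occurrences of state i+1 before the last position
        if visits == 0:
            # Step 2: state never left, so put all mass on the diagonal.
            estimate.append([1.0 if j == i else 0.0 for j in range(d)])
        else:
            # Step 3: relative frequencies of the observed transitions.
            estimate.append([count / visits for count in row])
    return estimate


sample = [3, 4, 2, 4, 3, 4, 3, 4, 4, 1]
for row in mle_transition_matrix(sample, 4):
    print(row)
```

State 1 appears only in the last position of this chain, so its row falls under Step 2 and receives a 1 on the diagonal.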

Since the transition probability matrix has $d^2$ elements, it is natural to rewrite $P$ as a column vector with elements [1]:

$$P_v = \mathrm{vec}(P) = [P_{11}, P_{12}, \ldots, P_{1d}, \ldots, P_{d1}, \ldots, P_{dd}]',$$

which allows us to concentrate on properties of the random vector $\hat{P}_v$. This vector has $d^2$ elements, labeled by a two-digit index. For instance, $P_{ij}$ is the element found on the row $k = j + (i - 1)d$ of the vector $\mathrm{vec}(P)$: $P_{ij} = (P_v)_k$.

The properties of the maximum likelihood estimator $\hat{P}$ have been studied extensively [1]. In particular, $\hat{P}$ can be shown to be asymptotically normal and consistent. The limiting probabilities computed from $\hat{P}$ are also consistent estimates of the true limiting probabilities. These results are presented in the two theorems below.

Let $\hat{P}_n$ be the maximum likelihood estimator corresponding to $n$ observations $X_1, \ldots, X_n$ from a Markov chain with transition probability matrix $P$. Let $(\hat{P}_v)_n$ and $P_v$ be the vector forms of $\hat{P}_n$ and $P$, respectively. The following theorem describes the asymptotic properties of the vector $(\hat{P}_v)_n$ as $n \to \infty$.

Theorem 2.1. As $n \to \infty$,

$$(\hat{P}_v)_n - P_v \to N(0, \Sigma_P), \tag{2.1}$$

where $\Sigma_P$ is given by

$$(\Sigma_P)_{(ij,kl)} = \delta_{ik} P_{ij} (\delta_{jl} - P_{il}). \tag{2.2}$$

Here, $\Sigma_P$ is a square $d^2 \times d^2$ matrix. The matrix element displayed corresponds to the row $j + (i - 1)d$ and the column $l + (k - 1)d$.

Now assume that for all integers $n > 0$, the limit $\lim_{m \to \infty} [\hat{P}_n]^{m}$ exists and has all rows identical. Denote by $\hat{\Pi}_n$ and $\Pi$ the steady-state probabilities corresponding to $\hat{P}_n$ and $P$, respectively. The following theorem establishes the consistency of the estimates of steady-state probabilities.

Theorem 2.2. For all $i$, $(\hat{\Pi}_n)_i \to (\Pi)_i$, with probability 1, as $n \to \infty$, where $(\hat{\Pi}_n)_i$ and $(\Pi)_i$ are the $i$th elements of $\hat{\Pi}_n$ and $\Pi$ respectively.

These results provide an asymptotic justification of the use of $\hat{P}$ to estimate $P$. When the sample size is not sufficiently large, the asymptotic results given above may not hold. In these cases, the bootstrap method, which is outlined in the next section, can be used to find approximate results corresponding to those given above.

3. The Bootstrap Method 

Let $X$ be a random variable with distribution function $F$ and let $\mathbf{X} = (x_1, \ldots, x_n)'$ be an observed sample from $F$. Suppose $R(\mathbf{X}, F)$ is a statistical quantity that depends in general on both the unknown distribution $F$ and on the sample $\mathbf{X}$. For example, $R(\mathbf{X}, F)$ could be an estimator of an unknown parameter. If $F$ is unknown, then the exact distribution of the random variable $R(\mathbf{X}, F)$ is generally unknown.

In 1979, Efron [5] proposed the bootstrap method to nonparametrically estimate the distribution of $R(\mathbf{X}, F)$. The method consists of the following three steps:

(i) : From the observed sample $\mathbf{X}$, use the empirical distribution function, $F_n$, as an estimate of the distribution function $F$. The empirical distribution function is defined by $F_n(x) = n(x)/n$, where $n(x)$ is the number of values $x_i$ in $\mathbf{X}$ that are less than or equal to $x$.

(ii) : Draw $B$ samples of size $n$ from $F_n$ conditional on $\mathbf{X}$. Denote these as $\mathbf{X}_i^*$, for $i = 1, \ldots, B$.

(iii) : For each sample $\mathbf{X}_i^*$, compute $R_i^* = R(\mathbf{X}_i^*, F_n)$ and approximate the distribution of $R(\mathbf{X}, F)$ with the empirical distribution of $R_1^*, \ldots, R_B^*$.

The samples $\mathbf{X}_1^*, \ldots, \mathbf{X}_B^*$ are called resamples and the empirical distribution of $R_1^*, \ldots, R_B^*$ is called the bootstrap estimate of the distribution of $R$, or simply the bootstrap distribution of $R^*$.
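A minimal sketch of steps (i) to (iii) for independent data follows. It assumes the quantity of interest is a plain statistic of the sample (here the mean), which is a simplification of the general $R(\mathbf{X}, F)$; the data values, seed and function name are made up for illustration.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, b, rng):
    """Return b bootstrap replicates of statistic(resample).

    Each resample draws len(sample) values with replacement, which is the
    same as sampling from the empirical distribution function F_n.
    """
    n = len(sample)
    replicates = []
    for _ in range(b):
        resample = [rng.choice(sample) for _ in range(n)]
        replicates.append(statistic(resample))
    return replicates


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # made-up observations
reps = bootstrap_distribution(data, statistics.mean, 2000, rng)
print(statistics.mean(reps), statistics.stdev(reps))
```

The list `reps` plays the role of $R_1^*, \ldots, R_B^*$; its spread is the bootstrap estimate of the sampling variability of the mean.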

The bootstrap principle states that the empirical distribution of $R_1^*, \ldots, R_B^*$ is a good approximation to the true distribution of $R(\mathbf{X}, F)$. Several authors have proven that the approximation is asymptotically valid for a large number of statistics of interest, and underlying populations, under some regularity conditions. See [4] and [6].

In [11], Kulperger and Prakasa Rao studied the applicability of the bootstrap method to the problem of estimating properties of Markov chains. Working under certain assumptions, they proved the following Central Limit Theorem for the bootstrap maximum likelihood estimator matrices.

Let $X_1, \ldots, X_n$ be $n$ observations from a Markov chain with transition probability matrix $P$ and let $\hat{P}_n$ be the maximum likelihood estimator of $P$ computed on the sample. Generate a bootstrap chain, $X_1^*, \ldots, X_{N_n}^*$, by generating a Markov chain with transition probability matrix $\hat{P}_n$, conditional on $X_1, \ldots, X_n$. Denote the maximum likelihood estimator for the bootstrap chain by $\hat{P}_n^*$. Let $(P_v^*)_n$ and $(\hat{P}_v)_n$ be the vector forms of $\hat{P}_n^*$ and $\hat{P}_n$, respectively.

Theorem 3.1. There is a sequence $N_n \in \mathbb{N}$, such that

$$(P_v^*)_n - (\hat{P}_v)_n \to N(0, \Sigma_P), \tag{3.1}$$

as $n \to \infty$ and $N_n \to \infty$, where

$$(\Sigma_P)_{(ij,kl)} = \delta_{ik} P_{ij} (\delta_{jl} - P_{il}). \tag{3.2}$$

This result indicates that the distribution of the bootstrap maximum likelihood estimator has asymptotic behavior similar to that of the distribution of the maximum likelihood estimator.



4. The Bootstrap Method for Finite State Markov Chains 

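When applied to the problem of estimating Markov chains, the bootstrap method consists of computing $\hat{P}$ from the original chain, and then generating $B$ additional samples based on $\hat{P}$. A uniform probability distribution for the initial state is used. For each of these resamples, a maximum likelihood estimator $\hat{P}_i^*$, $i = 1, \ldots, B$, is computed. Based on the vector sample $(P_v^*)_1, \ldots, (P_v^*)_B$, estimators for $E(\hat{P}_v)$ and $\mathrm{Cov}(\hat{P}_v)$ can be computed as follows:

$$\hat{E}(\hat{P}_v) = \frac{1}{B} \sum_{k=1}^{B} (P_v^*)_k,$$

$$\widehat{\mathrm{Cov}}(\hat{P}_v) = \frac{1}{B} \sum_{k=1}^{B} \left[(P_v^*)_k - \hat{E}(\hat{P}_v)\right] \left[(P_v^*)_k - \hat{E}(\hat{P}_v)\right]',$$

where $\widehat{\mathrm{Cov}}(\hat{P}_v)$ is a square matrix of dimension $d^2 \times d^2$.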
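The resampling scheme and the two moment estimators can be sketched as follows. This is an illustration under stated assumptions rather than the paper's C implementation: the resampled chains have the same length $n$ as the original chain, the covariance is normalised by $1/B$, and all function names are hypothetical.

```python
import random


def simulate_chain(p, n, rng):
    """Simulate n states (coded 1..d) from transition matrix p, uniform start."""
    states = list(range(1, len(p) + 1))
    chain = [rng.choice(states)]
    for _ in range(n - 1):
        chain.append(rng.choices(states, weights=p[chain[-1] - 1])[0])
    return chain


def mle(chain, d):
    """Maximum likelihood estimate; unvisited states get a 1 on the diagonal."""
    counts = [[0] * d for _ in range(d)]
    for current, following in zip(chain, chain[1:]):
        counts[current - 1][following - 1] += 1
    rows = []
    for i, row in enumerate(counts):
        total = sum(row)
        rows.append([c / total for c in row] if total
                    else [float(j == i) for j in range(d)])
    return rows


def bootstrap_moments(p_hat, n, b, rng):
    """Bootstrap mean vector and covariance matrix of vec(P*)."""
    d = len(p_hat)
    vecs = []
    for _ in range(b):
        p_star = mle(simulate_chain(p_hat, n, rng), d)
        vecs.append([x for row in p_star for x in row])  # row-by-row vec
    m = d * d
    mean = [sum(v[k] for v in vecs) / b for k in range(m)]
    cov = [[sum((v[r] - mean[r]) * (v[c] - mean[c]) for v in vecs) / b
            for c in range(m)] for r in range(m)]
    return mean, cov


p_hat = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0, 1.0],
         [0.2, 0.2, 0.4, 0.2]]
mean, cov = bootstrap_moments(p_hat, n=10, b=200, rng=random.Random(1))
print([round(x, 3) for x in mean[:4]])  # bootstrap mean of the first row
```

The row-by-row flattening matches the index rule $k = j + (i-1)d$ once the 1-based indices are shifted to Python's 0-based ones.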

The empirical distribution function for each element $(\hat{P}_v)_{ij}$ of the vector $\hat{P}_v$ can also be computed, based on the sample $[(P_v^*)_1]_{ij}, \ldots, [(P_v^*)_B]_{ij}$. Denote this function by $F_{ij}$. A $(1 - 2\alpha)100\%$ confidence interval based on the percentile method of Efron (1979) (see also Reference [7]) for the element $(P_v)_{ij}$ is given by $[F_{ij}^{-1}(\alpha), F_{ij}^{-1}(1 - \alpha)]$. Here, $x_L = [F_{ij}]^{-1}(\alpha)$ is the largest value of $x$ such that the number of elements in the sample $[(P_v^*)_1]_{ij}, \ldots, [(P_v^*)_B]_{ij}$ that are less than $x$ is smaller than $\alpha B$. Likewise, $x_U = [F_{ij}]^{-1}(1 - \alpha)$ is the smallest value of $x$ such that the number of elements in the sample $[(P_v^*)_1]_{ij}, \ldots, [(P_v^*)_B]_{ij}$ that are smaller than $x$ is larger than $(1 - \alpha)B$. Specifically,

$$x_L = \max\{x : F_{ij}(x) < \alpha\}, \qquad x_U = \min\{x : F_{ij}(x) > 1 - \alpha\}.$$
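A percentile interval can be read off the sorted bootstrap replicates. The sketch below uses one possible order-statistic convention (the $\lceil \alpha B \rceil$-th smallest value as lower endpoint and the $(\lfloor (1-\alpha)B \rfloor + 1)$-th smallest as upper endpoint); this rank convention is an assumption made for illustration, and other conventions differ by at most one order statistic.

```python
import math


def percentile_interval(replicates, alpha):
    """Order-statistic percentile interval from bootstrap replicates."""
    ordered = sorted(replicates)
    b = len(ordered)
    # Rounding before ceil/floor guards against floating-point noise in alpha * b.
    lower_rank = max(1, math.ceil(round(alpha * b, 9)))             # 1-based rank
    upper_rank = min(b, math.floor(round((1 - alpha) * b, 9)) + 1)  # 1-based rank
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


values = list(range(1, 101))  # stand-in for 100 bootstrap replicates
print(percentile_interval(values, 0.05))
```

With $\alpha = 0.05$ the interval has nominal coverage 90%, the level used in the simulation study.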

The bootstrap procedure may not perform well in some circumstances. For example, under certain conditions, the matrix $\hat{P}$ may not have a structure that is close to that of $P$. To illustrate one of these situations, we consider the following numerical example.

Table 1. The ten samples generated using the transition matrix in (4.1)

Sample Number    Generated Sample
 1               3, 4, 2, 4, 3, 4, 3, 4, 4, 1
 2               2, 2, 1, 4, 4, 4, 1, 1, 4, 3
 3               3, 2, 4, 3, 4, 2, 2, 4, 3, 4
 4               2, 4, 4, 4, 2, 4, 4, 2, 4, 3
 5               3, 2, 2, 4, 3, 4, 4, 4, 3, 4
 6               4, 4, 3, 4, 3, 4, 4, 3, 4, 4
 7               2, 2, 4, 4, 2, 4, 2, 3, 4, 4
 8               2, 3, 4, 3, 3, 3, 4, 1, 4, 2
 9               2, 4, 4, 1, 2, 3, 4, 4, 2, 3
10               1, 1, 4, 4, 1, 3, 4, 4, 4, 4

Example: Let the true transition probability matrix of a Markov chain be

$$P = \begin{bmatrix} 0.25 & 0.25 & 0.25 & 0.25 \\ 0.10 & 0.20 & 0.20 & 0.50 \\ 0.05 & 0.10 & 0.10 & 0.75 \\ 0.10 & 0.20 & 0.30 & 0.40 \end{bmatrix}. \tag{4.1}$$

Using the C code listed in Appendix A, we generated samples of length $n = 10$ from this transition matrix, using an initial distribution of $V_1 = (0.25, 0.25, 0.25, 0.25)'$. Ten such samples are listed in Table 1.

The first sample leads to the following maximum likelihood estimator $\hat{P}$:

$$\hat{P} = \begin{bmatrix} 1.00 & 0.00 & 0.00 & 0.00 \\ 0.00 & 0.00 & 0.00 & 1.00 \\ 0.00 & 0.00 & 0.00 & 1.00 \\ 0.20 & 0.20 & 0.40 & 0.20 \end{bmatrix}.$$

Note that the estimate $\hat{P}$ is significantly different from the original matrix $P$. The main difference is that $\hat{P}$ is sparse (has many null entries), while $P$ is not. Therefore, many valid transitions will never occur in resamples based on the matrix $\hat{P}$. Regardless of how many bootstrap resamples we use, the fact that all the bootstrap maximum likelihood estimators $\hat{P}^*$ are sparse may cause the bootstrap method to give unreliable results.

Computing maximum likelihood estimators from the other samples generated from $P$ leads again to sparse estimators, though they may differ from the one listed above. This is because the sample size chosen is relatively small compared to the total number of possible transitions ($n = 10$, for $d^2 = 16$). A chain of length 10 contains only 9 transitions, so at most 9 of the 16 possible transitions (about 56%) can be found in a given sample.

Another situation that leads to sparse estimators occurs when the matrix $P$ has elements with small probabilities. In this case, it is the existence of rare transitions (corresponding to the small probabilities) that causes the problem. For instance, if we use the matrix $\hat{P}$ as the true $P$ matrix, we obtain the samples listed in Table 2.

Table 2. The ten samples generated using $\hat{P}$ as the transition matrix

Sample Number    Generated Sample
 1               3, 4, 1, 1, 1, 1, 1, 1, 1, 1
 2               2, 4, 1, 1, 1, 1, 1, 1, 1, 1
 3               3, 4, 3, 4, 3, 4, 1, 1, 1, 1
 4               2, 4, 3, 4, 1, 1, 1, 1, 1, 1
 5               3, 4, 1, 1, 1, 1, 1, 1, 1, 1
 6               4, 4, 2, 4, 3, 4, 3, 4, 2, 4
 7               2, 4, 4, 4, 1, 1, 1, 1, 1, 1
 8               2, 4, 4, 3, 4, 2, 4, 1, 1, 1
 9               2, 4, 4, 1, 1, 1, 1, 1, 1, 1
10               1, 1, 1, 1, 1, 1, 1, 1, 1, 1


The maximum likelihood estimator of the first sample is:

$$\begin{bmatrix} 1.00 & 0.00 & 0.00 & 0.00 \\ 0.00 & 1.00 & 0.00 & 0.00 \\ 0.00 & 0.00 & 0.00 & 1.00 \\ 1.00 & 0.00 & 0.00 & 0.00 \end{bmatrix}.$$

As indicated earlier, increasing the number of samples does not help, since all the estimators will be sparse. To prevent this, one should use a non-sparse matrix to generate the resamples.

Next, we will describe a way of solving this problem, by smoothing the maximum likelihood estimators. This procedure replaces a sparse estimator by a modified version where all of the entries are positive.

5. Smoothed Estimators 

As indicated in the previous section, a problem related to estimating the transition probability matrix from observed sample chains is the possibility that some states of the system are too rare to occur in a limited experiment. A similar result is obtained when the chain length, $n$, is small compared to the total number of possible transitions, $d^2$. In this case only a fraction of all the possible transitions will be present in any given sample. When this happens, a particular transition may not be observed in the sample, even though the probability of this transition occurring is greater than 0.

When a sparse estimator $\hat{P}$ is obtained from the initial chain, the impact on the bootstrap method is significant. If we assume that $\hat{P}_{ij} = 0$ for some $i$ and $j$, then a transition from state $i$ to state $j$ will never be observed in any of the resamples, even though it may be possible in the actual Markov chain. A similar problem occurs in the case of using the bootstrap on independent discrete data. In [9] and [13], the authors exhibit several examples where sparse data causes the bootstrap to perform poorly.

One solution to this problem is to increase the sample size. When a larger sample size is not feasible, the following method can be used. Since the cause of the problem is the fact that $\hat{P}$ is sparse, we can attempt to generate the bootstrap resamples based on a slightly different matrix, whose entries are all positive. We call this matrix the smoothed version of $\hat{P}$ and denote it by $\tilde{P}$. It is given by

$$\tilde{P}_{ij} = \frac{1}{\omega}\left[\hat{P}_{ij} + n^{-u}\right], \tag{5.1}$$

where

$$\omega = \sum_{j=1}^{d} \left[\hat{P}_{ij} + n^{-u}\right] = \sum_{j=1}^{d} \hat{P}_{ij} + \sum_{j=1}^{d} n^{-u} = 1 + n^{-u} d,$$

and $u > 0$ is a positive smoothing parameter.

The form of this smoothed matrix is based on simple smoothers that are used for multinomial distributions. See, for example, [8] and [16]. Note that from the definition, we obtain

$$\sum_{j=1}^{d} \tilde{P}_{ij} = \frac{1}{\omega} \sum_{j=1}^{d} \left[\hat{P}_{ij} + n^{-u}\right] = 1 \quad \text{for all } i = 1, \ldots, d,$$

so that $\tilde{P}$ is a valid transition probability matrix.
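The smoothing step is a one-line transformation of each entry. The sketch below applies it to the sparse estimator obtained from the first sample of Table 1, with $n = 10$ and $u = 0.5$; the function name is illustrative.

```python
def smooth(p_hat, n, u):
    """Smoothed estimator: add n**(-u) to every entry and renormalise rows."""
    d = len(p_hat)
    eps = n ** (-u)
    omega = 1.0 + eps * d  # row normaliser; assumes each row of p_hat sums to 1
    return [[(x + eps) / omega for x in row] for row in p_hat]


p_hat = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0, 1.0],
         [0.2, 0.2, 0.4, 0.2]]
for row in smooth(p_hat, n=10, u=0.5):
    print([round(x, 4) for x in row])
```

Every entry of the result is strictly positive and each row still sums to 1, so transitions that were never observed can now appear in resamples.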

The choice of the smoothing parameter $u$ presents some difficulty. It is technically possible to specify a performance criterion for $\tilde{P}$ in terms of some measure of the performance of the resulting bootstrapping method. The parameter $u$ could then be chosen to optimize this criterion. However, it is unlikely that such a method would be feasible in practice, and it is well beyond the scope of this study. Nevertheless, we will justify some general properties that $u$ should satisfy. These will ensure that the smoothing does not asymptotically affect the behavior of the generated Markov chains.

The criterion we choose is to select the smoothing parameter such that $\tilde{P}$ is a consistent estimator of $P$ at the same rate as $\hat{P}$.

6. Asymptotic Properties of Smoothed Estimators 

In the following, we consider $n$ observations $X_1, X_2, \ldots, X_n$ from a Markov chain and establish the asymptotic properties of the smoothed estimator of the transition probability matrix. We begin by proving some general properties.

In order to study the asymptotic properties of estimators, we must introduce the following equivalence relation for matrices.

Let $\{P_n\}$ and $\{R_n\}$ be two sequences of $d \times d$ matrices, for $n = 1, 2, \ldots$. Suppose there is an $r > 0$ such that the sequence $n^{r}(E_n)_{ij} = n^{r}(P_n - R_n)_{ij}$ has the property that it remains bounded as $n \to \infty$ for all $i, j = 1, \ldots, d$. Then as $n \to \infty$

$$P_n = R_n + O(n^{-r}). \tag{6.1}$$

Here, $O(n^{-r})$ represents any sequence of matrices properly bounded.

The following theorem describes the asymptotic consistency property of the smoothed estimator defined earlier.

Theorem 6.1. Suppose $\hat{P} = P + O(n^{-k})$ as $n \to \infty$ for some $k > 0$. Then $\tilde{P} = P + O(n^{-k})$ as $n \to \infty$ as long as $u > k$.
Proof:

Consider the function $f(x) = (1 + x)^{-1}$. A Taylor expansion of $f$ around $x = 0$ is

$$f(x) = 1 + O(x) \quad \text{as } x \to 0.$$

We can rewrite $\omega^{-1} = f(n^{-u} d)$ so that

$$\omega^{-1} = 1 + O(n^{-u}), \quad \text{as } n \to \infty, \tag{6.2}$$

since $n^{-u} d$ tends to 0 as $n \to \infty$ for a fixed integer $d > 1$. Now computing

$$n^{u}\left[n^{-u} \omega^{-1}\right] = \omega^{-1} = 1 + O(n^{-u}),$$

which by definition [15] remains bounded as $n \to \infty$, so

$$n^{-u} \omega^{-1} = O(n^{-u}). \tag{6.3}$$

In matrix notation, the smoothed estimator can be rewritten as

$$\tilde{P} = \omega^{-1} \hat{P} + n^{-u} \omega^{-1} J, \tag{6.4}$$

where $J$ is a $d \times d$ matrix with all entries equal to 1. We conclude that:

$$\tilde{P} = \hat{P} + a_n \hat{P} + b_n J, \tag{6.5}$$

where the sequences $n^{u} a_n$ and $n^{u} b_n$ remain bounded as $n \to \infty$. Then for all $i, j = 1, \ldots, d$, $0 \le \hat{P}_{ij} \le 1$ and $J_{ij} = 1$, so $n^{u}[a_n \hat{P}_{ij} + b_n J_{ij}]$ remains bounded as $n \to \infty$. Therefore,

$$\tilde{P} = \hat{P} + O(n^{-u}), \quad \text{as } n \to \infty. \tag{6.6}$$

Since $\hat{P} = P + O(n^{-k})$, we can write

$$\tilde{P} = P + A_n + B_n, \tag{6.7}$$

where $n^{k}(A_n)_{ij}$ and $n^{u}(B_n)_{ij}$ remain bounded as $n \to \infty$. Then for all $k < u$, $n^{k}[(A_n)_{ij} + (B_n)_{ij}]$ remains bounded as $n \to \infty$, so,

$$\tilde{P} = P + O(n^{-k}), \tag{6.8}$$

as long as $k < u$.
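The step from (6.4) to (6.5) can be made explicit. The following worked computation is a supplementary check under the definition $\omega = 1 + n^{-u} d$, not part of the original proof; it writes out $a_n$ and $b_n$ and shows why $n^{u} a_n$ and $n^{u} b_n$ stay bounded.

```latex
\begin{align*}
\tilde{P} &= \omega^{-1}\hat{P} + n^{-u}\omega^{-1} J
           = \hat{P} + \underbrace{(\omega^{-1} - 1)}_{a_n}\,\hat{P}
             + \underbrace{n^{-u}\omega^{-1}}_{b_n}\, J, \\
a_n &= \frac{1 - \omega}{\omega} = -\frac{n^{-u} d}{1 + n^{-u} d},
  & n^{u} a_n &= -\frac{d}{1 + n^{-u} d} \;\longrightarrow\; -d, \\
b_n &= \frac{n^{-u}}{1 + n^{-u} d},
  & n^{u} b_n &= \frac{1}{1 + n^{-u} d} \;\longrightarrow\; 1 .
\end{align*}
```

Both limits are finite, so each entry of $n^{u}(\tilde{P} - \hat{P})$ is bounded by roughly $d + 1$ in absolute value for large $n$.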

As shown in [1] and [3], the exponent $k$ is usually equal to 0.5. Therefore, any choice of $u$ such that $u > 0.5$ will ensure that $\tilde{P}_n$ preserves the asymptotic consistency property of $\hat{P}_n$.

7. Performance of Smoothed Estimators 

To compare the performance of the smoothed and unsmoothed estimators, we present two examples.

Example: In this example we explore the behavior of the bootstrap bias estimator using $\hat{P}$ and $\tilde{P}$. We use the transition probability matrix from the example given in (4.1). The true probability matrix is:

$$P = \begin{bmatrix} 0.25 & 0.25 & 0.25 & 0.25 \\ 0.10 & 0.20 & 0.20 & 0.50 \\ 0.05 & 0.10 & 0.10 & 0.75 \\ 0.10 & 0.20 & 0.30 & 0.40 \end{bmatrix}.$$

Using a chain generated from $P$, with a uniform probability distribution for the initial state, the following maximum likelihood estimator is computed:

$$\hat{P} = \begin{bmatrix} 0.111111 & 0.222222 & 0.222222 & 0.444444 \\ 0.142857 & 0.142857 & 0.357143 & 0.357143 \\ 0.000000 & 0.037037 & 0.185185 & 0.777778 \\ 0.122449 & 0.183673 & 0.285714 & 0.408163 \end{bmatrix}.$$

With smoothing parameter $u = 0.5$ the smoothed maximum likelihood estimator is

$$\tilde{P} = \begin{bmatrix} 0.150794 & 0.230159 & 0.230159 & 0.388889 \\ 0.173469 & 0.173469 & 0.326531 & 0.326531 \\ 0.071429 & 0.097884 & 0.203704 & 0.626984 \\ 0.158892 & 0.202624 & 0.275510 & 0.362974 \end{bmatrix}.$$

After applying the bootstrap method, with $B = 1000$, the average estimator computed from the samples based on the unsmoothed matrix is found to be

$$\bar{P}^{*} = \begin{bmatrix} 0.099799 & 0.220662 & 0.233852 & 0.445688 \\ 0.145533 & 0.139222 & 0.359566 & 0.355678 \\ 0.000000 & 0.035484 & 0.177676 & 0.786840 \\ 0.121927 & 0.180267 & 0.287282 & 0.410525 \end{bmatrix}.$$

The average computed from the samples based on the smoothed estimator is given by

$$\begin{bmatrix} 0.171886 & 0.240630 & 0.241419 & 0.346065 \\ 0.196326 & 0.191570 & 0.307840 & 0.304264 \\ 0.122615 & 0.141866 & 0.212862 & 0.522657 \\ 0.183719 & 0.214623 & 0.270342 & 0.331316 \end{bmatrix}.$$

As we can see, the smoothed estimator contains some information about the low-probability transitions of the system, while the standard maximum likelihood estimator does not. In particular, the element corresponding to the transition $3 \to 1$, which has the lowest probability for this chain, is strictly zero in the average maximum likelihood estimator, but not in the smoothed version. Since the average is computed from non-negative numbers, it follows that $(\hat{P}^*)_{31} = 0$ for all the resamples based on $\hat{P}$. The bootstrap method based on $\hat{P}$ leads to the conclusion that the transition $3 \to 1$ is not allowed in this chain.

The bootstrap method based on $\tilde{P}$ does not lead to the same conclusion, as all the elements of $\tilde{P}$ are positive. While the smoothed estimate of $P_{31}$ is not very close to the true value 0.05, the confidence interval for this element predicted by the bootstrap method based on $\tilde{P}$ may have good coverage properties. The same conclusion holds for other statistical inference quantities. A simulation study of the coverage properties of the bootstrap confidence intervals is presented in the next section.
Table 3. Asymptotic Behavior of $\sqrt{n}(\hat{P}_n - P)$

Sample Size    $\sqrt{n}(\hat{P}_n - P)$

n = 50         -0.353553  -0.353553  -0.353553   1.060660
                0.303046   0.606092  -1.414214   0.505076
               -0.353553  -0.707107   0.380750   0.679910
                0.176777  -0.235702   0.530330  -0.471404

n = 100        -1.388889  -0.277778  -0.277778   1.944444
                0.428571  -0.571429   1.571429  -1.428571
               -0.500000  -0.629630   0.851852   0.277778
                0.224490  -0.163265  -0.142857   0.081633

n = 500        -0.356819   0.118940   0.118940   0.118940
                1.000346  -1.529941   2.000692  -1.471097
               -0.396722  -0.252459  -0.432787   1.081969
               -0.372678  -0.656623   0.301691   0.727609

n = 1000       -1.956855  -0.704468   1.800307   0.861016
                0.866102  -1.893338   0.926527   0.100710
               -0.119582   0.026574  -0.637770   0.730779
                0.044008  -0.792141   0.006286   0.741846

n = 10,000     -3.185526   1.503569   0.891948   0.790009
                0.154021  -0.091273   0.992584  -1.055333
               -0.764507  -0.681914   0.758153   0.688267
               -0.028548  -1.158239   0.261009   0.925773


Example: In this example we explore the asymptotic behavior of $\hat{P}_n$ and $\tilde{P}_n$ as $n \to \infty$. The matrix given in (4.1) is the true transition probability matrix of the system. Single samples of size 50, 100, 500, 1000 and 10,000 were generated based on $P$. For each sample, the estimators $\hat{P}_n$ and $\tilde{P}_n$ were computed. The matrices $\sqrt{n}(\hat{P}_n - P)$ and $\sqrt{n}(\tilde{P}_n - P)$ were then calculated. The results are listed in Tables 3 and 4.

The matrices listed in Tables 3 and 4 indicate that for each $i, j = 1, \ldots, d$, $[\sqrt{n}(\hat{P}_n - P)]_{ij}$ and $[\sqrt{n}(\tilde{P}_n - P)]_{ij}$ remain bounded as $n \to \infty$ and that they are of the same order of magnitude. In fact, simulations up to $n = 1{,}000{,}000$ indicate exactly the same result. This example illustrates by direct computation that $\hat{P}_n - P = O(n^{-0.5})$ and $\tilde{P}_n - P = O(n^{-0.5})$.



8. Simulation Study Structure 



The goal of this simulation study is to perform a quantitative comparison between the performance of the bootstrap method based on the maximum likelihood estimator and its smoothed version. The true transition probability matrix $P$ is known. To ensure that the structure of $P$ does not unduly influence the results, two different transition probability matrices were used:



Pi = 



4_ _3_ 

I 1 

1 I 

10 10 
Table 4. Asymptotic Behavior of $\sqrt{n}(\tilde{P}_n - P)$

Sample Size    $\sqrt{n}(\tilde{P}_n - P)$

n = 50         -0.225814  -0.225814  -0.225814   0.677441
                0.576773   0.514849  -0.775516  -0.316107
                0.285145  -0.068409   0.626403  -0.843139
                0.496126  -0.022803   0.210981  -0.684304

n = 100        -0.992063  -0.198413  -0.198413   1.388889
                0.734694  -0.265306   1.265306  -1.734694
                0.214286  -0.021164   1.037037  -1.230159
                0.588921   0.026239  -0.244898  -0.370262

n = 500        -0.302675   0.100892   0.100892   0.100892
                1.357508  -1.128134   1.866757  -2.096130
                0.342084   0.294804   0.141840  -0.778728
                0.192828  -0.387335   0.086260   0.108245

n = 1000       -1.737124  -0.625364   1.598154   0.764335
                1.301476  -1.503197   1.000032  -0.798310
                0.604016   0.556217  -0.033529  -1.126703
                0.571694  -0.525651  -0.171962   0.125919

n = 10,000     -3.063005   1.445740   0.857643   0.759625
                0.725021   0.104546   1.146716  -1.976280
                0.034128  -0.078763   1.305917  -1.261279
                0.549473  -0.921383   0.058663   0.313245


For both of the true transition probability matrices, simulations were conducted for all the combinations of parameters $n = 25, 50, 100$ and $u = 0.5, 1.0, 2.0$ and $\infty$. Note that $u = \infty$ corresponds to the standard bootstrap.

Each simulation consists of the following steps:

(1) A single chain of size $n$ is generated from the true transition probability matrix. Estimators $\hat{P}$ and $\tilde{P}$ are computed.

(2) The bootstrap method (as described before) is applied, using $\hat{P}$ and $\tilde{P}$, respectively. The number of bootstrap resamples generated is $B = 5000$.

(3) Bootstrap 90% confidence intervals for the elements $P_{11}$ and $P_{12}$ are computed, based on $\hat{P}$ and $\tilde{P}$, respectively, using the bootstrap percentile method outlined previously.

(4) Steps 1-3 are repeated 1000 times and the observed coverage properties of the intervals from the two estimators are compared.
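The four steps can be sketched as a small coverage experiment. This is an illustrative sketch, not the study's code: it uses the matrix (4.1) as a stand-in for the true matrix, far smaller values of $B$ and of the number of repetitions than the study, an assumed rank convention for the percentile interval, and it leaves the resample estimators unsmoothed. All function names are hypothetical.

```python
import math
import random


def simulate_chain(p, n, rng):
    """Simulate n states (coded 1..d) from transition matrix p, uniform start."""
    states = list(range(1, len(p) + 1))
    chain = [rng.choice(states)]
    for _ in range(n - 1):
        chain.append(rng.choices(states, weights=p[chain[-1] - 1])[0])
    return chain


def mle(chain, d):
    """Maximum likelihood estimate; unvisited states get a 1 on the diagonal."""
    counts = [[0] * d for _ in range(d)]
    for current, following in zip(chain, chain[1:]):
        counts[current - 1][following - 1] += 1
    rows = []
    for i, row in enumerate(counts):
        total = sum(row)
        rows.append([c / total for c in row] if total
                    else [float(j == i) for j in range(d)])
    return rows


def smooth(p_hat, n, u):
    """Smoothed estimator: (p_hat_ij + n**-u) / (1 + d * n**-u)."""
    eps = n ** (-u)
    omega = 1.0 + eps * len(p_hat)
    return [[(x + eps) / omega for x in row] for row in p_hat]


def percentile_interval(values, alpha):
    """Order-statistic interval; the rank convention is an assumption."""
    ordered = sorted(values)
    b = len(ordered)
    lower = max(1, math.ceil(round(alpha * b, 9)))
    upper = min(b, math.floor(round((1 - alpha) * b, 9)) + 1)
    return ordered[lower - 1], ordered[upper - 1]


def coverage(p_true, n, u, b, reps, alpha, entry, rng):
    """Share of repetitions whose interval contains the true entry.

    u=None skips smoothing (the standard bootstrap, u = infinity).
    """
    d = len(p_true)
    i, j = entry
    hits = 0
    for _ in range(reps):
        p_hat = mle(simulate_chain(p_true, n, rng), d)           # step 1
        source = p_hat if u is None else smooth(p_hat, n, u)
        stars = [mle(simulate_chain(source, n, rng), d)[i][j]    # step 2
                 for _ in range(b)]
        low, high = percentile_interval(stars, alpha)            # step 3
        hits += low <= p_true[i][j] <= high
    return hits / reps                                           # step 4


P_TRUE = [[0.25, 0.25, 0.25, 0.25],
          [0.10, 0.20, 0.20, 0.50],
          [0.05, 0.10, 0.10, 0.75],
          [0.10, 0.20, 0.30, 0.40]]
rng = random.Random(2024)
for u in (0.5, None):
    rate = coverage(P_TRUE, n=25, u=u, b=100, reps=30, alpha=0.05,
                    entry=(0, 0), rng=rng)
    print("u =", u, "coverage =", rate)
```

With so few repetitions the printed rates are noisy; the study's settings ($B = 5000$, 1000 repetitions) would be needed for estimates comparable to its coverage table.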

9. Simulation Results and Conclusion 

The results of the small simulation study are presented in Table 5 and seem to indicate that:

(1) For almost all combinations of simulation parameters $n$ and $u$, the coverage performance of the confidence intervals based on $\tilde{P}$ is better than for the intervals based on $\hat{P}$.
Table 5. The empirical coverage of the standard ($u = \infty$) and smoothed bootstrap percentile method confidence intervals for the entries $P_{11}$ and $P_{12}$ of $P_I$ and $P_{II}$. The specified nominal coverage is 90%.

                    P_I               P_II
  n      u      P11     P12       P11     P12
  25     0.5    90.6    90.6      99.6    99.6
  25     1      86.2    86.2      99.3    99.3
  25     2      81.6    81.6      53.0    53.0
  25     ∞      81.5    85.4      53.0    85.6
  50     0.5    93.1    93.1      97.8    97.8
  50     1      86.8    86.8      79.3    79.3
  50     2      85.4    85.4      79.6    79.6
  50     ∞      85.3    88.6      79.5    89.2
  100    0.5    92.0    92.9      94.2    94.2
  100    1      88.1    88.1      89.3    89.3
  100    2      87.1    87.1      82.7    82.7
  100    ∞      87.0    89.1      82.4    90.2


(2) At fixed chain length, $n$, increasing the smoothing parameter $u$ leads to narrower confidence intervals, with lower coverage performance.

(3) Increasing the chain length leads to better coverage performance of the standard confidence intervals. The effect this variation has on the coverage performance of the smoothed intervals (at fixed $u$) is inconclusive.

(4) Overall, it appears that the best coverage performance (always higher than the nominal value 90%) corresponds to the smallest value allowed for $u$, $u = 0.5$.